Learning Intermediate-Level Representations of Form and Motion from Natural Movies
Abstract
We present a model of intermediate-level visual representation that is based on learning invariances from movies of the natural environment. The model is composed of two stages of processing: an early feature representation layer and a second layer in which invariances are explicitly represented. Invariances are learned as the result of factoring apart the temporally stable and dynamic components embedded in the early feature representation. The structure contained in these components is made explicit in the activities of second-layer units that capture invariances in both form and motion. When trained on natural movies, the first layer produces a factorization, or separation, of image content into a temporally persistent part representing local edge structure and a dynamic part representing local motion structure, consistent with known response properties in early visual cortex (area V1). This factorization linearizes statistical dependencies among the first-layer units, making them learnable by the second layer. The second-layer units are split into two populations according to the factorization in the first layer. The form-selective units receive their input from the temporally persistent part (local edge structure) and after training result in a diverse set of higher-order shape features consisting of extended contours, multiscale edges, textures, and texture boundaries. The motion-selective units receive their input from the dynamic part (local motion structure) and after training result in a representation of image translation over different spatial scales and directions, in addition to more complex deformations. These representations provide a rich description of dynamic natural images and testable hypotheses regarding intermediate-level representation in visual cortex.
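The factorization into a temporally persistent part and a dynamic part can be illustrated with a toy example. The sketch below is not the model from the paper: it assumes a single hand-built complex Gabor filter (rather than learned first-layer features) and an amplitude/phase parameterization, which the abstract itself does not specify. The names `complex_gabor` and `factorize`, and all parameter values, are hypothetical. For a translating 1D grating, the amplitude of the filter response stays nearly constant (the "form" part), while the phase advances by a fixed step per frame (the "motion" part).

```python
import numpy as np

def complex_gabor(n=64, freq=0.1, sigma=8.0):
    """Hypothetical 1D complex Gabor filter: Gaussian envelope times a complex carrier."""
    x = np.arange(n) - n // 2
    envelope = np.exp(-x**2 / (2 * sigma**2))
    return envelope * np.exp(1j * 2 * np.pi * freq * x)

def factorize(frames, filt):
    """Split the complex filter response into amplitude and per-frame phase change.

    frames: array of shape (T, n), one 1D image per time step.
    Returns (amplitude of shape (T,), phase_step of shape (T-1,)).
    """
    z = frames @ np.conj(filt)                       # complex response per frame
    amplitude = np.abs(z)                            # temporally persistent part
    phase_step = np.angle(z[1:] * np.conj(z[:-1]))   # dynamic part, wrapped to (-pi, pi]
    return amplitude, phase_step

# Toy "movie": a cosine grating shifting right by `shift` pixels per frame.
n, freq, shift = 64, 0.1, 1.0
x = np.arange(n) - n // 2
frames = np.stack([np.cos(2 * np.pi * freq * (x - shift * t)) for t in range(10)])

amplitude, phase_step = factorize(frames, complex_gabor(n, freq))

# Amplitude is nearly constant over time; with this sign convention the
# phase step is close to -2*pi*freq*shift for every pair of frames.
print(np.round(amplitude, 3))
print(np.round(phase_step, 3))
```

In this toy setting the amplitude carries no information about the translation and the phase step carries no information about contrast, which is the sense in which the two parts are "factored apart". A second layer operating on each part separately, as the abstract describes, is outside the scope of this sketch.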
Related resources
Implications of News Segments and Movies for Enhancing Listening Comprehension of Language Learners
Abstract Armed with technological development, the present study aimed at gauging the effectiveness of exposure to news and movies as two types of audiovisual programs in improving language learners’ listening comprehension at the intermediate level. To this end, a listening comprehension test was administered to 108 language learners and finally 60 language learners were selected as intermedia...
Learning Transformational Invariants from Time-Varying Natural Images
We describe a hierarchical, probabilistic model that learns to extract complex motion from movies of the natural environment. The model consists of two hidden layers: the first layer produces a sparse representation of the image that is expressed in terms of local amplitude and phase variables. The second layer learns the higher-order structure among the time-varying phase variables. After trai...
Learning Transformational Invariants from Natural Movies
We describe a hierarchical, probabilistic model that learns to extract complex motion from movies of the natural environment. The model consists of two hidden layers: the first layer produces a sparse representation of the image that is expressed in terms of local amplitude and phase variables. The second layer learns the higher-order structure among the time-varying phase variables. After trai...
Reconstructing Visual Experiences from Brain Activity Evoked by Natural Movies
Quantitative modeling of human brain activity can provide crucial insights about cortical representations [1, 2] and can form the basis for brain decoding devices [3-5]. Recent functional magnetic resonance imaging (fMRI) studies have modeled brain activity elicited by static visual patterns and have reconstructed these patterns from brain activity [6-8]. However, blood oxygen level-dependent (...
Journal title:
- Neural Computation
Volume 24, Issue 4
Pages: -
Publication date: 2012